Multi-stream language identification using data-driven dependency selection
Abstract
The most widespread approach to automatic language identification in the past has been the statistical modeling of phone sequences extracted from speech signals. Recently, we have developed an alternative approach to LID based on n-gram modeling of parallel streams of articulatory features, which was shown to have advantages over phone-based systems on short test signals whereas the latter achieved a higher accuracy on longer signals. Additionally, phone and feature streams can be combined to achieve maximum performance. Within this “multi-stream” framework two types of statistical dependencies need to be modeled: (a) dependencies between symbols in individual streams and (b) dependencies between symbols in different streams. The space of possible dependencies is typically too large to be searched exhaustively. In this paper, we explore the use of genetic algorithms as a method for data-driven dependency selection. The result is a general framework for the discovery and modeling of dependencies between multiple information sources expressed as sequences of symbols, which has implications for other fields beyond language identification, such as speaker identification or language modeling.
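The idea of data-driven dependency selection can be illustrated with a toy sketch. The code below is not the authors' system: it assumes that each stream is a list of integer symbols, that candidate dependencies are hypothetical `(stream_name, lag)` pairs, and that a selection is scored by the held-out log-likelihood of an add-alpha smoothed conditional model. A simple genetic algorithm with truncation selection, one-point crossover, bit-flip mutation, and elitism then searches over bitmasks that switch individual candidate dependencies on or off. All function names and parameters are illustrative.

```python
import math
import random
from collections import Counter

def context(streams, t, mask, candidates):
    """Values of the selected conditioning variables at time t."""
    return tuple(streams[s][t - lag]
                 for bit, (s, lag) in zip(mask, candidates) if bit)

def heldout_loglik(mask, train, dev, target, candidates, vocab_size, alpha=0.5):
    """Add-alpha smoothed log-likelihood of the target stream on dev data.

    Assumption: candidates never include (target, 0), which would let the
    model condition on the symbol it is trying to predict.
    """
    max_lag = max(lag for _, lag in candidates)
    joint, ctx_counts = Counter(), Counter()
    for t in range(max_lag, len(train[target])):
        c = context(train, t, mask, candidates)
        joint[c, train[target][t]] += 1
        ctx_counts[c] += 1
    total = 0.0
    for t in range(max_lag, len(dev[target])):
        c = context(dev, t, mask, candidates)
        p = (joint[c, dev[target][t]] + alpha) / (ctx_counts[c] + alpha * vocab_size)
        total += math.log(p)
    return total

def select_dependencies(train, dev, target, candidates, vocab_size,
                        pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Search over dependency bitmasks with a small genetic algorithm."""
    rng = random.Random(seed)
    n = len(candidates)  # needs at least two candidates for crossover

    def fitness(mask):
        return heldout_loglik(mask, train, dev, target, candidates, vocab_size)

    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per position
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

In this sketch a candidate such as `("phone", 1)` stands for a dependency within a stream (type a), while `("feat", 1)` with a target of `"phone"` stands for a dependency across streams (type b). Scoring on held-out data rather than training data is a design choice that penalizes selections whose contexts are too sparse to estimate reliably.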
Similar resources
The Effect of Morphology on Persian Dependency Parsing
Data-driven systems can be adapted easily to different languages and domains. This trend led to the introduction of data-driven approaches in dependency parsing. The only prerequisite of data-driven approaches is the existence of appropriate corpora that contain sentences and their associated dependency trees. Despite obtaining highly accurate results for the dependency parsing task in English langu...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Pronominal Reference Type Identification and Event Anaphora Resolution for Hindi
In this paper, we present hybrid approaches for pronominal reference type (abstract or concrete) identification and event anaphora resolution for Hindi. Pronominal reference type identification is an important part of any anaphora resolution system, as it helps the anaphora resolver select optimal features based on pronominal reference types. We use language-specific rules and feature...
Global Inference and Learning Algorithms for Multi-Lingual Dependency Parsing
This paper gives an overview of the work of McDonald et al. (McDonald et al. 2005a, 2005b; McDonald and Pereira 2006; McDonald et al. 2006) on global inference and learning algorithms for data-driven dependency parsing. Further details can be found in the thesis of McDonald (McDonald 2006). This paper is primarily intended for the audience of the ESSLLI 2007 course on data-driven dependency pars...
The Influence of Data-Driven Exercises Through Using a Computer Program on Vocabulary Improvement in an EFL Context
The present study was conducted to evaluate data-driven learning (DDL) combined with Computer Assisted Language Learning (CALL) as an approach to improving the vocabulary knowledge of Iranian postgraduates majoring in teaching English, English literature, and translation. The purpose was to help language learners become familiar with DDL as a student-centered method taking advantage of a computer progr...